Overview

Dataset statistics

                                 Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    443          426
Missing cells (%)                8.3%         8.0%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
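
The missing-cell figures can be cross-checked against the per-column missing counts reported elsewhere in this report (Age, Cabin and Embarked). The sketch below is only an arithmetic check; the names `MISSING_PER_COLUMN` and `missing_summary` are illustrative and not part of ydata-profiling.

```python
# Per-column missing counts as listed in this report (Age, Cabin, Embarked).
MISSING_PER_COLUMN = {
    "Dataset A": {"Age": 89, "Cabin": 353, "Embarked": 1},
    "Dataset B": {"Age": 83, "Cabin": 342, "Embarked": 1},
}
N_VARIABLES = 12
N_OBSERVATIONS = 446


def missing_summary(per_column: dict) -> tuple:
    """Return (missing cells, missing cells as a percentage of all cells)."""
    missing_cells = sum(per_column.values())
    total_cells = N_VARIABLES * N_OBSERVATIONS
    return missing_cells, 100.0 * missing_cells / total_cells


for name, per_column in MISSING_PER_COLUMN.items():
    cells, pct = missing_summary(per_column)
    print(f"{name}: {cells} missing cells ({pct:.1f}%)")
```

The totals and percentages printed this way should agree with the "Missing cells" rows above.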

Variable types

               Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Alert      Dataset A                               Dataset B
Missing    Age has 89 (20.0%) missing values       Age has 83 (18.6%) missing values
Missing    Cabin has 353 (79.1%) missing values    Cabin has 342 (76.7%) missing values
Unique     PassengerId has unique values           PassengerId has unique values
Unique     Name has unique values                  Name has unique values
Zeros      SibSp has 299 (67.0%) zeros             SibSp has 296 (66.4%) zeros
Zeros      Parch has 340 (76.2%) zeros             Parch has 329 (73.8%) zeros
Zeros      Fare has 7 (1.6%) zeros                 Fare has 6 (1.3%) zeros

Reproduction

                          Dataset A                     Dataset B
Analysis started          2023-12-07 16:08:43.182664    2023-12-07 16:08:46.856741
Analysis finished         2023-12-07 16:08:46.855711    2023-12-07 16:08:50.740655
Duration                  3.67 seconds                  3.88 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json

Variables

PassengerId
Real number (ℝ)

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            443.47534    437.16592
Minimum         3            1
Maximum         889          890
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      3            1
5-th percentile              37.25        49.25
Q1                           235.25       221.5
median                       439          430.5
Q3                           658.5        654.75
95-th percentile             841.5        841.75
Maximum                      889          890
Range                        886          889
Interquartile range (IQR)    423.25       433.25

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 253.59671        251.97954
Coefficient of variation (CV)      0.5718395        0.57639338
Kurtosis                           -1.1294502       -1.1453682
Mean                               443.47534        437.16592
Median Absolute Deviation (MAD)    211              216
Skewness                           -0.02666757      0.034909975
Sum                                197790           194976
Variance                           64311.293        63493.689
Monotonicity                       Not monotonic    Not monotonic
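
Several of the descriptive statistics are linked by simple identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below checks the PassengerId figures against those identities. The helper name `dispersion` is illustrative, and the tolerance allows for the rounding in the reported values.

```python
import math

# Reported PassengerId statistics, copied from the table above.
REPORTED = {
    "Dataset A": {"mean": 443.47534, "std": 253.59671, "cv": 0.5718395, "var": 64311.293},
    "Dataset B": {"mean": 437.16592, "std": 251.97954, "cv": 0.57639338, "var": 63493.689},
}


def dispersion(mean: float, std: float) -> tuple:
    """Return (coefficient of variation, variance) derived from mean and std."""
    return std / mean, std ** 2


for name, s in REPORTED.items():
    cv, var = dispersion(s["mean"], s["std"])
    cv_ok = math.isclose(cv, s["cv"], rel_tol=1e-5)
    var_ok = math.isclose(var, s["var"], rel_tol=1e-5)
    print(f"{name}: CV consistent={cv_ok}, variance consistent={var_ok}")
```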
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value                 Count    Frequency (%)
604                   1        0.2%
666                   1        0.2%
220                   1        0.2%
573                   1        0.2%
308                   1        0.2%
707                   1        0.2%
166                   1        0.2%
800                   1        0.2%
727                   1        0.2%
463                   1        0.2%
Other values (436)    436      97.8%

Common values, Dataset B
Value                 Count    Frequency (%)
415                   1        0.2%
449                   1        0.2%
92                    1        0.2%
165                   1        0.2%
107                   1        0.2%
670                   1        0.2%
592                   1        0.2%
772                   1        0.2%
409                   1        0.2%
853                   1        0.2%
Other values (436)    436      97.8%

Smallest 10 values, Dataset A
Value    Count    Frequency (%)
3        1        0.2%
4        1        0.2%
5        1        0.2%
6        1        0.2%
8        1        0.2%
9        1        0.2%
12       1        0.2%
14       1        0.2%
15       1        0.2%
16       1        0.2%

Smallest 10 values, Dataset B
Value    Count    Frequency (%)
1        1        0.2%
2        1        0.2%
4        1        0.2%
6        1        0.2%
9        1        0.2%
10       1        0.2%
11       1        0.2%
12       1        0.2%
19       1        0.2%
23       1        0.2%

Survived
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Value counts    Dataset A    Dataset B
0               280          273
1               166          173

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    2            2
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    0            1
2nd row    0            1
3rd row    0            1
4th row    0            1
5th row    1            1

Common Values

Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
0        280          62.8%            273          61.2%
1        166          37.2%            173          38.8%
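
The frequency column is each count divided by the number of observations, rounded to one decimal place. A minimal sketch using the Survived counts from this report follows; the function name `frequency_table` is an illustrative choice, not a ydata-profiling API.

```python
def frequency_table(counts: dict) -> dict:
    """Map each value to its share of all observations, in percent (1 decimal)."""
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}


# Survived counts as reported for the two datasets.
survived_a = frequency_table({"0": 280, "1": 166})
survived_b = frequency_table({"0": 273, "1": 173})
print("Dataset A:", survived_a)
print("Dataset B:", survived_b)
```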

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common-value bar charts for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
0 280
62.8%
1 166
37.2%
ValueCountFrequency (%)
0 273
61.2%
1 173
38.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 280
62.8%
1 166
37.2%
ValueCountFrequency (%)
0 273
61.2%
1 173
38.8%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 280
62.8%
1 166
37.2%
ValueCountFrequency (%)
0 273
61.2%
1 173
38.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 280
62.8%
1 166
37.2%
ValueCountFrequency (%)
0 273
61.2%
1 173
38.8%

Pclass
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Value counts    Dataset A    Dataset B
3               240          240
1               103          112
2               103          94

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    3            3
2nd row    3            2
3rd row    3            1
4th row    1            1
5th row    3            2

Common Values

Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
3        240          53.8%            240          53.8%
1        103          23.1%            112          25.1%
2        103          23.1%            94           21.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common-value bar charts for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
3 240
53.8%
1 103
23.1%
2 103
23.1%
ValueCountFrequency (%)
3 240
53.8%
1 112
25.1%
2 94
 
21.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 240
53.8%
1 103
23.1%
2 103
23.1%
ValueCountFrequency (%)
3 240
53.8%
1 112
25.1%
2 94
 
21.1%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 240
53.8%
1 103
23.1%
2 103
23.1%
ValueCountFrequency (%)
3 240
53.8%
1 112
25.1%
2 94
 
21.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 240
53.8%
1 103
23.1%
2 103
23.1%
ValueCountFrequency (%)
3 240
53.8%
1 112
25.1%
2 94
 
21.1%

Name
Text

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       82           82
Median length    51           49
Mean length      27.204036    27.802691
Min length       12           12

Characters and Unicode

                       Dataset A    Dataset B
Total characters       12133        12400
Distinct characters    60           59
Distinct categories    7            7
Distinct scripts       2            2
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

           Dataset A                        Dataset B
1st row    Torber, Mr. Ernst William        Sundman, Mr. Johan Julian
2nd row    Kink, Mr. Vincenz                Ridsdale, Miss. Lucy
3rd row    Kilgannon, Mr. Thomas J          Goldenberg, Mr. Samuel L
4th row    Graham, Mr. George Edward        Daly, Mr. Peter Denis
5th row    Devaney, Miss. Margaret Delia    Trout, Mrs. William H (Jessie L)
ValueCountFrequency (%)
mr 261
 
14.3%
miss 90
 
4.9%
mrs 63
 
3.5%
william 35
 
1.9%
master 22
 
1.2%
john 21
 
1.2%
henry 20
 
1.1%
charles 15
 
0.8%
george 13
 
0.7%
james 12
 
0.7%
Other values (897) 1271
69.7%
ValueCountFrequency (%)
mr 253
 
13.6%
miss 87
 
4.7%
mrs 77
 
4.1%
john 28
 
1.5%
william 28
 
1.5%
master 22
 
1.2%
george 15
 
0.8%
henry 14
 
0.8%
charles 13
 
0.7%
james 12
 
0.6%
Other values (918) 1310
70.5%

Most occurring characters

ValueCountFrequency (%)
1378
 
11.4%
r 1007
 
8.3%
e 898
 
7.4%
a 821
 
6.8%
i 680
 
5.6%
n 659
 
5.4%
s 650
 
5.4%
M 559
 
4.6%
l 514
 
4.2%
o 490
 
4.0%
Other values (50) 4477
36.9%
ValueCountFrequency (%)
1414
 
11.4%
r 1021
 
8.2%
a 880
 
7.1%
e 874
 
7.0%
i 698
 
5.6%
n 658
 
5.3%
s 654
 
5.3%
M 566
 
4.6%
l 538
 
4.3%
o 516
 
4.2%
Other values (49) 4581
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7812
64.4%
Uppercase Letter 1834
 
15.1%
Space Separator 1378
 
11.4%
Other Punctuation 966
 
8.0%
Open Punctuation 68
 
0.6%
Close Punctuation 68
 
0.6%
Dash Punctuation 7
 
0.1%
ValueCountFrequency (%)
Lowercase Letter 7989
64.4%
Uppercase Letter 1865
 
15.0%
Space Separator 1414
 
11.4%
Other Punctuation 953
 
7.7%
Close Punctuation 86
 
0.7%
Open Punctuation 86
 
0.7%
Dash Punctuation 7
 
0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1378
100.0%
ValueCountFrequency (%)
1414
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 1007
12.9%
e 898
11.5%
a 821
10.5%
i 680
8.7%
n 659
8.4%
s 650
8.3%
l 514
 
6.6%
o 490
 
6.3%
t 340
 
4.4%
h 265
 
3.4%
Other values (16) 1488
19.0%
ValueCountFrequency (%)
r 1021
12.8%
a 880
11.0%
e 874
10.9%
i 698
8.7%
n 658
8.2%
s 654
8.2%
l 538
 
6.7%
o 516
 
6.5%
t 350
 
4.4%
h 279
 
3.5%
Other values (16) 1521
19.0%
Uppercase Letter
ValueCountFrequency (%)
M 559
30.5%
A 124
 
6.8%
J 107
 
5.8%
H 103
 
5.6%
S 102
 
5.6%
C 90
 
4.9%
W 80
 
4.4%
E 79
 
4.3%
L 63
 
3.4%
B 61
 
3.3%
Other values (15) 466
25.4%
ValueCountFrequency (%)
M 566
30.3%
A 127
 
6.8%
J 112
 
6.0%
H 100
 
5.4%
S 92
 
4.9%
C 92
 
4.9%
E 83
 
4.5%
B 74
 
4.0%
L 72
 
3.9%
R 67
 
3.6%
Other values (15) 480
25.7%
Other Punctuation
ValueCountFrequency (%)
. 447
46.3%
, 446
46.2%
" 66
 
6.8%
' 6
 
0.6%
/ 1
 
0.1%
ValueCountFrequency (%)
. 447
46.9%
, 446
46.8%
" 56
 
5.9%
' 4
 
0.4%
Open Punctuation
ValueCountFrequency (%)
( 68
100.0%
ValueCountFrequency (%)
( 86
100.0%
Close Punctuation
ValueCountFrequency (%)
) 68
100.0%
ValueCountFrequency (%)
) 86
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7
100.0%
ValueCountFrequency (%)
- 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9646
79.5%
Common 2487
 
20.5%
ValueCountFrequency (%)
Latin 9854
79.5%
Common 2546
 
20.5%

Most frequent character per script

Common
ValueCountFrequency (%)
1378
55.4%
. 447
 
18.0%
, 446
 
17.9%
( 68
 
2.7%
) 68
 
2.7%
" 66
 
2.7%
- 7
 
0.3%
' 6
 
0.2%
/ 1
 
< 0.1%
ValueCountFrequency (%)
1414
55.5%
. 447
 
17.6%
, 446
 
17.5%
) 86
 
3.4%
( 86
 
3.4%
" 56
 
2.2%
- 7
 
0.3%
' 4
 
0.2%
Latin
ValueCountFrequency (%)
r 1007
 
10.4%
e 898
 
9.3%
a 821
 
8.5%
i 680
 
7.0%
n 659
 
6.8%
s 650
 
6.7%
M 559
 
5.8%
l 514
 
5.3%
o 490
 
5.1%
t 340
 
3.5%
Other values (41) 3028
31.4%
ValueCountFrequency (%)
r 1021
 
10.4%
a 880
 
8.9%
e 874
 
8.9%
i 698
 
7.1%
n 658
 
6.7%
s 654
 
6.6%
M 566
 
5.7%
l 538
 
5.5%
o 516
 
5.2%
t 350
 
3.6%
Other values (41) 3099
31.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12133
100.0%
ValueCountFrequency (%)
ASCII 12400
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1378
 
11.4%
r 1007
 
8.3%
e 898
 
7.4%
a 821
 
6.8%
i 680
 
5.6%
n 659
 
5.4%
s 650
 
5.4%
M 559
 
4.6%
l 514
 
4.2%
o 490
 
4.0%
Other values (50) 4477
36.9%
ValueCountFrequency (%)
1414
 
11.4%
r 1021
 
8.2%
a 880
 
7.1%
e 874
 
7.0%
i 698
 
5.6%
n 658
 
5.3%
s 654
 
5.3%
M 566
 
4.6%
l 538
 
4.3%
o 516
 
4.2%
Other values (49) 4581
36.9%

Sex
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Value counts    Dataset A    Dataset B
male            291          280
female          155          166

Length

                 Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6950673    4.7443946
Min length       4            4

Characters and Unicode

                       Dataset A    Dataset B
Total characters       2094         2116
Distinct characters    5            5
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    male         male
2nd row    male         female
3rd row    male         male
4th row    male         male
5th row    female       female
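
The length and character statistics for Sex follow directly from the value counts: total characters is the sum of string length times count, and mean length is that total divided by the number of rows. The sketch below reproduces the reported figures; the helper `length_stats` is a hypothetical name used only for illustration.

```python
def length_stats(counts: dict) -> tuple:
    """Return (total characters, mean string length) for a value-count mapping."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    n_rows = sum(counts.values())
    return total_chars, total_chars / n_rows


# Sex value counts as reported for the two datasets.
total_a, mean_a = length_stats({"male": 291, "female": 155})
total_b, mean_b = length_stats({"male": 280, "female": 166})
print(f"Dataset A: {total_a} characters, mean length {mean_a:.7f}")
print(f"Dataset B: {total_b} characters, mean length {mean_b:.7f}")
```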

Common Values

Value     Count (A)    Frequency (A)    Count (B)    Frequency (B)
male      291          65.2%            280          62.8%
female    155          34.8%            166          37.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common-value bar charts for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%
ValueCountFrequency (%)
e 612
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 166
 
7.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2094
100.0%
ValueCountFrequency (%)
Lowercase Letter 2116
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%
ValueCountFrequency (%)
e 612
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 166
 
7.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 2094
100.0%
ValueCountFrequency (%)
Latin 2116
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%
ValueCountFrequency (%)
e 612
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 166
 
7.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2094
100.0%
ValueCountFrequency (%)
ASCII 2116
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%
ValueCountFrequency (%)
e 612
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 166
 
7.8%

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        77           72
Distinct (%)    21.6%        19.8%
Missing         89           83
Missing (%)     20.0%        18.6%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.441653    29.488044
Minimum         0.42         0.75
Maximum         80           74
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
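
The Age percentages can be reproduced from the counts. One detail is inferred from the numbers and not stated in the report itself: Missing (%) uses all 446 observations as its denominator, while Distinct (%) appears to use only the non-missing observations. The sketch below checks that reading; the function names are illustrative.

```python
N_OBSERVATIONS = 446


def missing_pct(n_missing: int) -> float:
    """Missing values as a percentage of all observations."""
    return round(100.0 * n_missing / N_OBSERVATIONS, 1)


def distinct_pct(n_distinct: int, n_missing: int) -> float:
    """Distinct values as a percentage of non-missing observations (assumed)."""
    return round(100.0 * n_distinct / (N_OBSERVATIONS - n_missing), 1)


# Age counts as reported: (distinct, missing) per dataset.
for name, (distinct, missing) in {"Dataset A": (77, 89), "Dataset B": (72, 83)}.items():
    print(f"{name}: missing {missing_pct(missing)}%, distinct {distinct_pct(distinct, missing)}%")
```

Using 446 as the denominator for Distinct (%) would give about 17.3% for Dataset A, which does not match the reported 21.6%.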

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0.42         0.75
5-th percentile              5.8          4.1
Q1                           21           20
median                       29           28
Q3                           39           38
95-th percentile             56.2         55.35
Maximum                      80           74
Range                        79.58        73.25
Interquartile range (IQR)    18           18

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 14.568429        14.450168
Coefficient of variation (CV)      0.47856892       0.4900348
Kurtosis                           0.37075641       0.048340959
Mean                               30.441653        29.488044
Median Absolute Deviation (MAD)    9                9
Skewness                           0.45680885       0.36328728
Sum                                10867.67         10704.16
Variance                           212.23912        208.80735
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value                Count    Frequency (%)
28                   16       3.6%
22                   14       3.1%
25                   14       3.1%
19                   14       3.1%
24                   14       3.1%
18                   12       2.7%
30                   12       2.7%
31                   11       2.5%
36                   11       2.5%
21                   11       2.5%
Other values (67)    228      51.1%
(Missing)            89       20.0%

Common values, Dataset B
Value                Count    Frequency (%)
24                   15       3.4%
28                   15       3.4%
22                   15       3.4%
19                   14       3.1%
30                   14       3.1%
25                   14       3.1%
36                   12       2.7%
18                   12       2.7%
21                   12       2.7%
26                   11       2.5%
Other values (62)    229      51.3%
(Missing)            83       18.6%

Smallest 10 values, Dataset A
Value    Count    Frequency (%)
0.42     1        0.2%
0.83     1        0.2%
0.92     1        0.2%
1        1        0.2%
2        4        0.9%
3        4        0.9%
4        5        1.1%
5        1        0.2%
6        1        0.2%
7        1        0.2%

Smallest 10 values, Dataset B
Value    Count    Frequency (%)
0.75     2        0.4%
0.83     2        0.4%
1        3        0.7%
2        6        1.3%
3        1        0.2%
4        5        1.1%
5        1        0.2%
6        3        0.7%
7        1        0.2%
8        2        0.4%

SibSp
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.54035874    0.57847534
Minimum         0             0
Maximum         8             8
Zeros           299           296
Zeros (%)       67.0%         66.4%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            1
95-th percentile             2            3
Maximum                      8            8
Range                        8            8
Interquartile range (IQR)    1            1

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 1.1205672        1.2111406
Coefficient of variation (CV)      2.0737468        2.0936771
Kurtosis                           18.403109        16.227935
Mean                               0.54035874       0.57847534
Median Absolute Deviation (MAD)    0                0
Skewness                           3.7392648        3.6121318
Sum                                241              258
Variance                           1.2556709        1.4668615
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=7)
Common values
Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
0        299          67.0%            296          66.4%
1        108          24.2%            111          24.9%
2        17           3.8%             12           2.7%
4        9            2.0%             11           2.5%
3        7            1.6%             8            1.8%
8        4            0.9%             5            1.1%
5        2            0.4%             3            0.7%
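
Because SibSp takes only seven distinct values, its Sum and Mean can be recomputed from the value counts alone. The sketch below does that as a consistency check; `summarize` is an illustrative helper, not something the report or ydata-profiling defines.

```python
def summarize(value_counts: dict) -> tuple:
    """Return (number of rows, sum of values, mean) from a value -> count mapping."""
    n_rows = sum(value_counts.values())
    total = sum(value * count for value, count in value_counts.items())
    return n_rows, total, total / n_rows


# SibSp value counts as reported for the two datasets.
sibsp_a = {0: 299, 1: 108, 2: 17, 4: 9, 3: 7, 8: 4, 5: 2}
sibsp_b = {0: 296, 1: 111, 2: 12, 4: 11, 3: 8, 8: 5, 5: 3}

for name, counts in {"Dataset A": sibsp_a, "Dataset B": sibsp_b}.items():
    n_rows, total, mean = summarize(counts)
    print(f"{name}: n={n_rows}, sum={total}, mean={mean:.8f}")
```

The results should match the reported observation count (446), Sum and Mean for each dataset.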
Values in ascending order
Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
0        299          67.0%            296          66.4%
1        108          24.2%            111          24.9%
2        17           3.8%             12           2.7%
3        7            1.6%             8            1.8%
4        9            2.0%             11           2.5%
5        2            0.4%             3            0.7%
8        4            0.9%             5            1.1%

Parch
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.39237668    0.41479821
Minimum         0             0
Maximum         6             6
Zeros           340           329
Zeros (%)       76.2%         73.8%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           0            1
95-th percentile             2            2
Maximum                      6            6
Range                        6            6
Interquartile range (IQR)    0            1

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 0.85896996       0.81868946
Coefficient of variation (CV)      2.1891463        1.9737054
Kurtosis                           11.593563        9.3096702
Mean                               0.39237668       0.41479821
Median Absolute Deviation (MAD)    0                0
Skewness                           3.0187772        2.5834683
Sum                                175              185
Variance                           0.73782939       0.67025243
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=7)
Common values
Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
0        340          76.2%            329          73.8%
1        59           13.2%            63           14.1%
2        38           8.5%             48           10.8%
5        4            0.9%             2            0.4%
3        2            0.4%             2            0.4%
4        2            0.4%             1            0.2%
6        1            0.2%             1            0.2%

Values in ascending order
Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
0        340          76.2%            329          73.8%
1        59           13.2%            63           14.1%
2        38           8.5%             48           10.8%
3        2            0.4%             2            0.4%
4        2            0.4%             1            0.2%
5        4            0.9%             2            0.4%
6        1            0.2%             1            0.2%

Ticket
Text

                Dataset A    Dataset B
Distinct        384          376
Distinct (%)    86.1%        84.3%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.6726457    6.706278
Min length       3            3

Characters and Unicode

                       Dataset A    Dataset B
Total characters       2976         2991
Distinct characters    35           34
Distinct categories    5            5
Distinct scripts       2            2
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        337          326
Unique (%)    75.6%        73.1%

Sample

           Dataset A    Dataset B
1st row    364511       STON/O 2. 3101269
2nd row    315151       W./C. 14258
3rd row    36865        17453
4th row    PC 17582     113055
5th row    330958       240929
ValueCountFrequency (%)
pc 34
 
6.0%
c.a 14
 
2.5%
ca 7
 
1.2%
a/5 7
 
1.2%
w./c 6
 
1.1%
382652 5
 
0.9%
14879 5
 
0.9%
sc/paris 5
 
0.9%
s.o.c 5
 
0.9%
f.c.c 4
 
0.7%
Other values (399) 472
83.7%
ValueCountFrequency (%)
pc 32
 
5.7%
ca 9
 
1.6%
2 8
 
1.4%
ston/o 8
 
1.4%
c.a 8
 
1.4%
a/5 7
 
1.2%
sc/paris 6
 
1.1%
w./c 5
 
0.9%
a/4 5
 
0.9%
3101295 5
 
0.9%
Other values (398) 472
83.5%

Most occurring characters

ValueCountFrequency (%)
3 375
12.6%
1 330
11.1%
2 297
10.0%
7 244
8.2%
4 235
7.9%
6 209
 
7.0%
0 209
 
7.0%
5 196
 
6.6%
9 169
 
5.7%
8 144
 
4.8%
Other values (25) 568
19.1%
ValueCountFrequency (%)
3 361
12.1%
1 349
11.7%
2 296
9.9%
4 254
8.5%
7 237
7.9%
6 216
 
7.2%
0 204
 
6.8%
5 184
 
6.2%
9 166
 
5.5%
8 143
 
4.8%
Other values (24) 581
19.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2408
80.9%
Uppercase Letter 293
 
9.8%
Other Punctuation 144
 
4.8%
Space Separator 118
 
4.0%
Lowercase Letter 13
 
0.4%
ValueCountFrequency (%)
Decimal Number 2410
80.6%
Uppercase Letter 310
 
10.4%
Other Punctuation 135
 
4.5%
Space Separator 119
 
4.0%
Lowercase Letter 17
 
0.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 375
15.6%
1 330
13.7%
2 297
12.3%
7 244
10.1%
4 235
9.8%
6 209
8.7%
0 209
8.7%
5 196
8.1%
9 169
7.0%
8 144
 
6.0%
ValueCountFrequency (%)
3 361
15.0%
1 349
14.5%
2 296
12.3%
4 254
10.5%
7 237
9.8%
6 216
9.0%
0 204
8.5%
5 184
7.6%
9 166
6.9%
8 143
 
5.9%
Space Separator
ValueCountFrequency (%)
118
100.0%
ValueCountFrequency (%)
119
100.0%
Other Punctuation
ValueCountFrequency (%)
. 103
71.5%
/ 41
 
28.5%
ValueCountFrequency (%)
. 86
63.7%
/ 49
36.3%
Uppercase Letter
ValueCountFrequency (%)
C 86
29.4%
P 44
15.0%
A 43
14.7%
O 32
 
10.9%
S 31
 
10.6%
N 13
 
4.4%
T 11
 
3.8%
W 8
 
2.7%
F 5
 
1.7%
I 5
 
1.7%
Other values (6) 15
 
5.1%
ValueCountFrequency (%)
C 72
23.2%
P 55
17.7%
O 44
14.2%
A 39
12.6%
S 38
12.3%
T 16
 
5.2%
N 16
 
5.2%
W 10
 
3.2%
Q 5
 
1.6%
R 4
 
1.3%
Other values (5) 11
 
3.5%
Lowercase Letter
ValueCountFrequency (%)
a 4
30.8%
s 3
23.1%
i 2
15.4%
r 2
15.4%
l 1
 
7.7%
e 1
 
7.7%
ValueCountFrequency (%)
a 5
29.4%
s 4
23.5%
i 3
17.6%
r 3
17.6%
l 1
 
5.9%
e 1
 
5.9%

Most occurring scripts

ValueCountFrequency (%)
Common 2670
89.7%
Latin 306
 
10.3%
ValueCountFrequency (%)
Common 2664
89.1%
Latin 327
 
10.9%

Most frequent character per script

Common
ValueCountFrequency (%)
3 375
14.0%
1 330
12.4%
2 297
11.1%
7 244
9.1%
4 235
8.8%
6 209
7.8%
0 209
7.8%
5 196
7.3%
9 169
6.3%
8 144
 
5.4%
Other values (3) 262
9.8%
ValueCountFrequency (%)
3 361
13.6%
1 349
13.1%
2 296
11.1%
4 254
9.5%
7 237
8.9%
6 216
8.1%
0 204
7.7%
5 184
6.9%
9 166
6.2%
8 143
 
5.4%
Other values (3) 254
9.5%
Latin
ValueCountFrequency (%)
C 86
28.1%
P 44
14.4%
A 43
14.1%
O 32
 
10.5%
S 31
 
10.1%
N 13
 
4.2%
T 11
 
3.6%
W 8
 
2.6%
F 5
 
1.6%
I 5
 
1.6%
Other values (12) 28
 
9.2%
ValueCountFrequency (%)
C 72
22.0%
P 55
16.8%
O 44
13.5%
A 39
11.9%
S 38
11.6%
T 16
 
4.9%
N 16
 
4.9%
W 10
 
3.1%
a 5
 
1.5%
Q 5
 
1.5%
Other values (11) 27
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2976
100.0%
ValueCountFrequency (%)
ASCII 2991
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 375
12.6%
1 330
11.1%
2 297
10.0%
7 244
8.2%
4 235
7.9%
6 209
 
7.0%
0 209
 
7.0%
5 196
 
6.6%
9 169
 
5.7%
8 144
 
4.8%
Other values (25) 568
19.1%
ValueCountFrequency (%)
3 361
12.1%
1 349
11.7%
2 296
9.9%
4 254
8.5%
7 237
7.9%
6 216
 
7.2%
0 204
 
6.8%
5 184
 
6.2%
9 166
 
5.5%
8 143
 
4.8%
Other values (24) 581
19.4%

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        182          181
Distinct (%)    40.8%        40.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.867049    33.45979
Minimum         0            0
Maximum         263          512.3292
Zeros           7            6
Zeros (%)       1.6%         1.3%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.2292       7.225
Q1                           8.0344       8.05
median                       15.5         15.3729
Q3                           30.3927      34.03125
95-th percentile             112.18125    112.67708
Maximum                      263          512.3292
Range                        263          512.3292
Interquartile range (IQR)    22.3583      25.98125
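
Range and interquartile range are derived from the other quantile rows: Range is Maximum minus Minimum, and IQR is Q3 minus Q1. The sketch below recomputes both for Fare from the reported quantiles; the helper name `spread` is illustrative only.

```python
import math


def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Return the range and interquartile range implied by the given quantiles."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# Fare quantiles as reported for the two datasets.
fare_a = spread(minimum=0, q1=8.0344, q3=30.3927, maximum=263)
fare_b = spread(minimum=0, q1=8.05, q3=34.03125, maximum=512.3292)

for name, result in {"Dataset A": fare_a, "Dataset B": fare_b}.items():
    print(f"{name}: range={result['range']:.4f}, IQR={result['iqr']:.5f}")
```

The much larger range in Dataset B comes entirely from its maximum of 512.3292, while the two IQRs differ by only a few units, which is consistent with the higher kurtosis and skewness reported for Dataset B.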

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 41.324374        50.624524
Coefficient of variation (CV)      1.338786         1.5129959
Kurtosis                           12.259916        37.88923
Mean                               30.867049        33.45979
Median Absolute Deviation (MAD)    7.75             7.7979
Skewness                           3.2356587        5.070449
Sum                                13766.704        14923.066
Variance                           1707.7039        2562.8424
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value                 Count    Frequency (%)
8.05                  22       4.9%
13                    22       4.9%
7.8958                21       4.7%
26                    19       4.3%
7.75                  16       3.6%
10.5                  11       2.5%
26.55                 9        2.0%
7.925                 8        1.8%
7.775                 7        1.6%
0                     7        1.6%
Other values (172)    304      68.2%

Common values, Dataset B
Value                 Count    Frequency (%)
13                    26       5.8%
8.05                  23       5.2%
7.8958                16       3.6%
7.75                  15       3.4%
26                    15       3.4%
10.5                  10       2.2%
26.55                 9        2.0%
7.2292                7        1.6%
7.775                 7        1.6%
7.225                 7        1.6%
Other values (171)    311      69.7%

Smallest 10 values, Dataset A
Value     Count    Frequency (%)
0         7        1.6%
4.0125    1        0.2%
6.45      1        0.2%
6.75      1        0.2%
7.05      2        0.4%
7.0542    1        0.2%
7.125     1        0.2%
7.225     4        0.9%
7.2292    6        1.3%
7.25      6        1.3%

Smallest 10 values, Dataset B
Value     Count    Frequency (%)
0         6        1.3%
4.0125    1        0.2%
6.2375    1        0.2%
6.4958    2        0.4%
6.75      2        0.4%
7.0458    1        0.2%
7.05      2        0.4%
7.125     3        0.7%
7.225     7        1.6%
7.2292    7        1.6%

Cabin
Text

                Dataset A    Dataset B
Distinct        84           92
Distinct (%)    90.3%        88.5%
Missing         353          342
Missing (%)     79.1%        76.7%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       15           11
Median length    3            3
Mean length      3.5591398    3.4326923
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       331          357
Distinct characters    19           18
Distinct categories    3            3
Distinct scripts       2            2
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        75           82
Unique (%)    80.6%        78.8%

Sample

           Dataset A    Dataset B
1st row    C91          C92
2nd row    C7           E17
3rd row    C99          B79
4th row    E40          E12
5th row    F2           B50
ValueCountFrequency (%)
d 2
 
1.9%
b63 2
 
1.9%
c93 2
 
1.9%
d36 2
 
1.9%
b96 2
 
1.9%
c52 2
 
1.9%
b66 2
 
1.9%
b98 2
 
1.9%
b59 2
 
1.9%
b57 2
 
1.9%
Other values (84) 87
81.3%
ValueCountFrequency (%)
d 3
 
2.6%
g6 3
 
2.6%
f 3
 
2.6%
g73 2
 
1.7%
c126 2
 
1.7%
b98 2
 
1.7%
b96 2
 
1.7%
d33 2
 
1.7%
b49 2
 
1.7%
b77 2
 
1.7%
Other values (92) 94
80.3%

Most occurring characters

ValueCountFrequency (%)
C 33
 
10.0%
3 32
 
9.7%
2 30
 
9.1%
B 29
 
8.8%
6 24
 
7.3%
1 22
 
6.6%
5 21
 
6.3%
8 20
 
6.0%
9 18
 
5.4%
D 16
 
4.8%
Other values (9) 86
26.0%
ValueCountFrequency (%)
1 40
11.2%
C 34
 
9.5%
2 31
 
8.7%
B 29
 
8.1%
3 27
 
7.6%
6 22
 
6.2%
D 21
 
5.9%
8 19
 
5.3%
0 19
 
5.3%
5 19
 
5.3%
Other values (8) 96
26.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 210
63.4%
Uppercase Letter 107
32.3%
Space Separator 14
 
4.2%
ValueCountFrequency (%)
Decimal Number 227
63.6%
Uppercase Letter 117
32.8%
Space Separator 13
 
3.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 33
30.8%
B 29
27.1%
D 16
15.0%
E 13
 
12.1%
A 10
 
9.3%
F 4
 
3.7%
G 1
 
0.9%
T 1
 
0.9%
ValueCountFrequency (%)
C 34
29.1%
B 29
24.8%
D 21
17.9%
E 16
13.7%
F 6
 
5.1%
G 6
 
5.1%
A 5
 
4.3%
Decimal Number
ValueCountFrequency (%)
3 32
15.2%
2 30
14.3%
6 24
11.4%
1 22
10.5%
5 21
10.0%
8 20
9.5%
9 18
8.6%
4 16
7.6%
0 14
6.7%
7 13
6.2%
ValueCountFrequency (%)
1 40
17.6%
2 31
13.7%
3 27
11.9%
6 22
9.7%
8 19
8.4%
0 19
8.4%
5 19
8.4%
9 17
7.5%
7 17
7.5%
4 16
 
7.0%
Space Separator
ValueCountFrequency (%)
14
100.0%
ValueCountFrequency (%)
13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 224
67.7%
Latin 107
32.3%
ValueCountFrequency (%)
Common 240
67.2%
Latin 117
32.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 33
30.8%
B 29
27.1%
D 16
15.0%
E 13
 
12.1%
A 10
 
9.3%
F 4
 
3.7%
G 1
 
0.9%
T 1
 
0.9%
ValueCountFrequency (%)
C 34
29.1%
B 29
24.8%
D 21
17.9%
E 16
13.7%
F 6
 
5.1%
G 6
 
5.1%
A 5
 
4.3%
Common
ValueCountFrequency (%)
3 32
14.3%
2 30
13.4%
6 24
10.7%
1 22
9.8%
5 21
9.4%
8 20
8.9%
9 18
8.0%
4 16
7.1%
0 14
6.2%
14
6.2%
ValueCountFrequency (%)
1 40
16.7%
2 31
12.9%
3 27
11.2%
6 22
9.2%
8 19
7.9%
0 19
7.9%
5 19
7.9%
9 17
7.1%
7 17
7.1%
4 16
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 331
100.0%
ValueCountFrequency (%)
ASCII 357
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 33
 
10.0%
3 32
 
9.7%
2 30
 
9.1%
B 29
 
8.8%
6 24
 
7.3%
1 22
 
6.6%
5 21
 
6.3%
8 20
 
6.0%
9 18
 
5.4%
D 16
 
4.8%
Other values (9) 86
26.0%
ValueCountFrequency (%)
1 40
11.2%
C 34
 
9.5%
2 31
 
8.7%
B 29
 
8.1%
3 27
 
7.6%
6 22
 
6.2%
D 21
 
5.9%
8 19
 
5.3%
0 19
 
5.3%
5 19
 
5.3%
Other values (8) 96
26.9%

Embarked
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         1            1
Missing (%)     0.2%         0.2%
Memory size     7.0 KiB      7.0 KiB

Value counts    Dataset A    Dataset B
S               322          325
C               78           90
Q               45           30

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       445          445
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    S            S
2nd row    S            S
3rd row    Q            C
4th row    S            S
5th row    Q            S

Common Values

Value        Count (A)    Frequency (A)    Count (B)    Frequency (B)
S            322          72.2%            325          72.9%
C            78           17.5%            90           20.2%
Q            45           10.1%            30           6.7%
(Missing)    1            0.2%             1            0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common-value bar charts for Dataset A and Dataset B]

Value    Count (A)    Frequency (A)    Count (B)    Frequency (B)
s        322          72.4%            325          73.0%
c        78           17.5%            90           20.2%
q        45           10.1%            30           6.7%

Most occurring characters

ValueCountFrequency (%)
S 322
72.4%
C 78
 
17.5%
Q 45
 
10.1%
ValueCountFrequency (%)
S 325
73.0%
C 90
 
20.2%
Q 30
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 445
100.0%
ValueCountFrequency (%)
Uppercase Letter 445
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 322
72.4%
C 78
 
17.5%
Q 45
 
10.1%
ValueCountFrequency (%)
S 325
73.0%
C 90
 
20.2%
Q 30
 
6.7%

Most occurring scripts

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
Latin   445     100.0%          445     100.0%

Most frequent character per script

Latin
        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
S       322     72.4%           325     73.0%
C       78      17.5%           90      20.2%
Q       45      10.1%           30      6.7%

Most occurring blocks

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
ASCII   445     100.0%          445     100.0%

Most frequent character per block

ASCII
        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
S       322     72.4%           325     73.0%
C       78      17.5%           90      20.2%
Q       45      10.1%           30      6.7%

Interactions

[Figures: 25 pairs of interaction plots, one for Dataset A and one for Dataset B in each pair; images not preserved]

Missing values

Dataset A

[Figure: a simple visualization of nullity by column.]

Dataset B

[Figure: a simple visualization of nullity by column.]

Dataset A

[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Dataset B

[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]
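
The nullity-by-column view comes down to counting missing cells per column. A minimal sketch follows, using only the standard library; the three `rows` are illustrative records with a reduced set of columns, and `missing_by_column` is a hypothetical helper rather than the report's own implementation.

```python
# Illustrative records with a reduced column set; None marks a missing cell.
rows = [
    {"Age": 44.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": None, "Embarked": "Q"},
    {"Age": 38.0, "Cabin": "C91", "Embarked": "S"},
]

def missing_by_column(records):
    """Return {column: (missing_count, missing_percent)}."""
    n = len(records)
    result = {}
    for col in records[0].keys():
        missing = sum(1 for r in records if r[col] is None)
        result[col] = (missing, round(100 * missing / n, 1))
    return result

summary = missing_by_column(rows)
for col, (count, pct) in summary.items():
    print(f"{col}: {count} missing ({pct}%)")
```

Applied to the full datasets, the same count would be expected to reproduce alerts such as "Cabin has 353 (79.1%) missing values" for Dataset A.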

Sample

Dataset A

     PassengerId  Survived  Pclass  Name                             Sex     Age    SibSp  Parch  Ticket            Fare      Cabin  Embarked
603  604          0         3       Torber, Mr. Ernst William        male    44.0   0      0      364511            8.0500    NaN    S
69   70           0         3       Kink, Mr. Vincenz                male    26.0   2      0      315151            8.6625    NaN    S
778  779          0         3       Kilgannon, Mr. Thomas J          male    NaN    0      0      36865             7.7375    NaN    Q
332  333          0         1       Graham, Mr. George Edward        male    38.0   0      1      PC 17582          153.4625  C91    S
44   45           1         3       Devaney, Miss. Margaret Delia    female  19.0   0      0      330958            7.8792    NaN    Q
153  154          0         3       van Billiard, Mr. Austin Blyler  male    40.5   0      2      A/5. 851          14.5000   NaN    S
561  562          0         3       Sivic, Mr. Husein                male    40.0   0      0      349251            7.8958    NaN    S
2    3            1         3       Heikkinen, Miss. Laina           female  26.0   0      0      STON/O2. 3101282  7.9250    NaN    S
832  833          0         3       Saad, Mr. Amin                   male    NaN    0      0      2671              7.2292    NaN    C
318  319          1         1       Wick, Miss. Mary Natalie         female  31.0   0      2      36928             164.8667  C7     S

Dataset B

     PassengerId  Survived  Pclass  Name                              Sex     Age    SibSp  Parch  Ticket             Fare     Cabin  Embarked
414  415          1         3       Sundman, Mr. Johan Julian         male    44.00  0      0      STON/O 2. 3101269  7.9250   NaN    S
526  527          1         2       Ridsdale, Miss. Lucy              female  50.00  0      0      W./C. 14258        10.5000  NaN    S
453  454          1         1       Goldenberg, Mr. Samuel L          male    49.00  1      0      17453              89.1042  C92    C
857  858          1         1       Daly, Mr. Peter Denis             male    51.00  0      0      113055             26.5500  E17    S
399  400          1         2       Trout, Mrs. William H (Jessie L)  female  28.00  0      0      240929             12.6500  NaN    S
320  321          0         3       Dennis, Mr. Samuel                male    22.00  0      0      A/5 21172          7.2500   NaN    S
504  505          1         1       Maioni, Miss. Roberta             female  16.00  0      0      110152             86.5000  B79    S
831  832          1         2       Richards, Master. George Sibley   male    0.83   1      1      29106              18.7500  NaN    S
111  112          0         3       Zabour, Miss. Hileni              female  14.50  1      0      2665               14.4542  NaN    C
353  354          0         3       Arnold-Franchi, Mr. Josef         male    25.00  1      0      349237             17.8000  NaN    S

Dataset A

     PassengerId  Survived  Pclass  Name                              Sex     Age    SibSp  Parch  Ticket       Fare     Cabin  Embarked
74   75           1         3       Bing, Mr. Lee                     male    32.0   0      0      1601         56.4958  NaN    S
594  595          0         2       Chapman, Mr. John Henry           male    37.0   1      0      SC/AH 29037  26.0000  NaN    S
374  375          0         3       Palsson, Miss. Stina Viola        female  3.0    3      1      349909       21.0750  NaN    S
147  148          0         3       Ford, Miss. Robina Maggie "Ruby"  female  9.0    2      2      W./C. 6608   34.3750  NaN    S
722  723          0         2       Gillespie, Mr. William Henry      male    34.0   0      0      12233        13.0000  NaN    S
676  677          0         3       Sawyer, Mr. Frederick Charles     male    24.5   0      0      342826       8.0500   NaN    S
407  408          1         2       Richards, Master. William Rowe    male    3.0    1      1      29106        18.7500  NaN    S
152  153          0         3       Meo, Mr. Alfonzo                  male    55.5   0      0      A.5. 11206   8.0500   NaN    S
690  691          1         1       Dick, Mr. Albert Adrian           male    31.0   1      0      17474        57.0000  B20    S
488  489          0         3       Somerton, Mr. Francis William     male    30.0   0      0      A.5. 18509   8.0500   NaN    S

Dataset B

     PassengerId  Survived  Pclass  Name                                                Sex     Age    SibSp  Parch  Ticket      Fare      Cabin  Embarked
439  440          0         2       Kvillner, Mr. Johan Henrik Johannesson              male    31.0   0      0      C.A. 18723  10.5000   NaN    S
695  696          0         2       Chapman, Mr. Charles Henry                          male    52.0   0      0      248731      13.5000   NaN    S
372  373          0         3       Beavan, Mr. William Thomas                          male    19.0   0      0      323951      8.0500    NaN    S
858  859          1         3       Baclini, Mrs. Solomon (Latifa Qurban)               female  24.0   0      3      2666        19.2583   NaN    C
290  291          1         1       Barber, Miss. Ellen "Nellie"                        female  26.0   0      0      19877       78.8500   NaN    S
334  335          1         1       Frauenthal, Mrs. Henry William (Clara Heinsheimer)  female  NaN    1      0      PC 17611    133.6500  NaN    S
141  142          1         3       Nysten, Miss. Anna Sofia                            female  22.0   0      0      347081      7.7500    NaN    S
269  270          1         1       Bissette, Miss. Amelia                              female  35.0   0      0      PC 17760    135.6333  C99    S
355  356          0         3       Vanden Steen, Mr. Leo Peter                         male    28.0   0      0      345783      9.5000    NaN    S
288  289          1         2       Hosono, Mr. Masabumi                                male    42.0   0      0      237798      13.0000   NaN    S

Duplicate rows

Dataset A

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.
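
For readers who want to repeat a check like this by hand, the sketch below shows one plausible way to find fully duplicated rows with the standard library. The `rows` list is made-up data with a deliberately repeated record, and `duplicate_rows` is a hypothetical helper; the report does not state how its own "# duplicates" column is computed.

```python
from collections import Counter

# Made-up records (PassengerId, Name, Sex); the third repeats the first on purpose.
rows = [
    (3, "Heikkinen, Miss. Laina", "female"),
    (45, "Devaney, Miss. Margaret Delia", "female"),
    (3, "Heikkinen, Miss. Laina", "female"),
]

def duplicate_rows(records):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(records)
    return {row: n for row, n in counts.items() if n > 1}

dupes = duplicate_rows(rows)
print(dupes or "Dataset does not contain duplicate rows.")
```

Because PassengerId is unique in both datasets, a whole-row comparison of this kind cannot find duplicates in them, which matches the result reported here.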